ITEM-EXAMINEE SAMPLING PROCEDURES AND ASSOCIATED STANDARD ERRORS IN
ESTIMATING TEST PARAMETERS 

David M. Shoemaker 



ABSTRACT 

Selected parameters for a negatively skewed and a normally
distributed normative distribution were estimated in a post-mortem
item-examinee sampling investigation. The number of subtests, the
number of items per subtest, and the number of examinees responding
to each subtest were manipulated systematically. Each item-examinee
sampling procedure was replicated five times. Defining one observation
as the score received by one examinee on one item, the results indicate
that the mean of a normative distribution is easily and efficiently
estimated with a relatively small number of observations; the variance,
by contrast, is a more difficult parameter to approximate and requires
a larger number of observations to obtain a reasonably efficient
estimate. The results of this investigation support the conclusion
that, in estimating parameters by item-examinee sampling, the variable
of importance is not the item-examinee sampling procedure but is
instead the number of observations obtained by that procedure.



An important aspect of large-scale tryouts of criterion-referenced 
instructional programs is the collection of student achievement data 
indicating the effectiveness of the program. Collection of this data 
frequently involves individual administration of criterion-referenced 
tests — a procedure which is time-consuming and costly to implement with 
the entire tryout population in a large-scale tryout. However, accurate 
estimates of the population mean and variance can be obtained through 
item-examinee sampling, a much more economical procedure. The study 
described herein was conducted to investigate the utility of various 
item-examinee sampling procedures when used for group assessment with 
criterion-referenced instructional programs. 



Item-examinee Sampling 

Item-examinee sampling is a procedure in which a set of K test
items is subdivided into t subtests of items and each subtest of
items is administered to a different subgroup of examinees selected
from the testable population of N examinees. Although each examinee
receives only a portion of the complete set of items, the statistical
model described by Lord (1960, 1962) permits the researcher to
estimate the mean and variance of the total test score distribution
which would have been obtained by testing N examinees on K items.

To demonstrate the procedure and applicability of item-examinee sampling
in educational research, consider the following situation: A 100-item
comprehensive examination is to be administered to 5000 grade 1
students at the end of a specific instructional program. The purpose
of the examination is group assessment, not individual assessment.
Item-examinee sampling is a desirable experimental procedure here for
any of several reasons, e.g., it is not economically feasible to
administer the complete set of items to all examinees, the amount of
testing time is prohibitive, the scoring costs are prohibitive, or the
cooperation of individual schools could be more readily obtained if
only a few minutes of each student's time were required. One possible
item-examinee sampling procedure which might be used in this situation
is as follows: (a) The 100-item test is subdivided into five subtests
each containing 20 items. Items are assigned to subtests by sampling
at random and without replacement from the 100-item pool. (b) Each
subtest is administered to three classes of examinees which have been,
for each subtest, randomly selected without replacement from the pool
of testable classes. In this particular procedure, approximately 450
students would be tested (assuming 30 students per class), each over
20 items; however, not all students would receive the same 20 items.
The testing time per examinee would be approximately 1/5 of the time
required to administer 100 items.
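
The arithmetic of this example can be checked with a short Python
sketch. The figures are those given in the text; the comparison with
a complete administration (every student on every item) is added here
only for illustration, and the variable names are hypothetical.

```python
# Worked arithmetic for the example design described in the text.
K = 100                      # items in the complete examination
N = 5000                     # grade 1 students in the population
t = 5                        # number of subtests
items_per_subtest = K // t   # 20 items, assigned without replacement
classes_per_subtest = 3
students_per_class = 30      # class size assumed in the text

students_tested = t * classes_per_subtest * students_per_class
observations = students_tested * items_per_subtest   # examinee-by-item scores
census_observations = N * K                           # everyone on every item
fraction_of_census = observations / census_observations
testing_time_fraction = items_per_subtest / K

print(students_tested)         # 450
print(observations)            # 9000
print(census_observations)     # 500000
print(fraction_of_census)      # 0.018
print(testing_time_fraction)   # 0.2
```

Under these figures the design collects under 2 percent of the
observations a complete administration would require.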
The mean and variance for the total test are estimated from subtest
results by

$$\hat{\mu}_i = \frac{K}{k_i}\,\bar{X}_i$$

and

$$\hat{\sigma}^2_i = \frac{m_i K\left[(K - 1)\,s_i^2 - (K - k_i)\sum p_i q_i\right]}{k_i (k_i - 1)(m_i - 1)}$$

where, referring to the i-th subtest,

$K$             is the number of items in the complete test,
$m_i$           is the number of examinees taking the subtest,
$k_i$           is the number of subtest items,
$\bar{X}_i$     is the mean subtest score,
$s_i^2$         is the variance of the subtest scores, and
$\sum p_i q_i$  is the sum of the $k_i$ subtest item variances.



A single estimate of $\mu$ is obtained by averaging the t estimates
of $\mu$ obtained from each item-examinee sample; a single estimate of
$\sigma^2$, by averaging the t estimates of the population variance. If
the total number of examinees N is less than 500, the pooled estimate
of $\sigma^2$ is multiplied by (N - 1)/N.
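
The subtest estimators and the pooling step can be sketched in Python
as follows. This is a minimal illustration, not the author's program:
the function names are hypothetical, and it assumes that the subtest
score variance and the item variances p(1 - p) are computed with
divisor m. That assumption is consistent with the formula, since when
a subtest contains all K items the variance estimate reduces to
m s^2 / (m - 1), the usual unbiased sample variance.

```python
from statistics import mean

def subtest_estimates(scores, K):
    """Estimate the total-test mean and variance from one subtest.

    scores: list of examinee rows, each a list of 0/1 item scores
            (m examinees by k items).
    K:      number of items in the complete test.
    """
    m = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    x_bar = sum(totals) / m
    # Assumption: s^2 and p*q use divisor m; the factor m / (m - 1) in
    # the variance formula supplies the small-sample correction.
    s2 = sum((x - x_bar) ** 2 for x in totals) / m
    p = [sum(row[j] for row in scores) / m for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    mu_hat = K / k * x_bar
    var_hat = (m * K * ((K - 1) * s2 - (K - k) * sum_pq)
               / (k * (k - 1) * (m - 1)))
    return mu_hat, var_hat

def pooled_estimates(subtests, K, N=None):
    """Average the t subtest estimates; rescale the variance if N < 500."""
    estimates = [subtest_estimates(scores, K) for scores in subtests]
    mu = mean(e[0] for e in estimates)
    var = mean(e[1] for e in estimates)
    if N is not None and N < 500:
        var *= (N - 1) / N
    return mu, var

# Tiny illustrative data set: 3 examinees, a 2-item subtest, K = 4.
example = [[1, 1], [0, 0], [1, 0]]
print(subtest_estimates(example, K=4))
```

For the tiny data set, the mean subtest score is 1, so the estimated
total-test mean is (4/2)(1) = 2.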



Item-examinee sampling differs from item-sampling and from
examinee-sampling. In item-sampling, a randomly selected subset of test
items is administered to all examinees; in examinee-sampling, all items
are administered to a randomly selected subgroup of examinees. Both
item-sampling and item-examinee sampling procedures implicitly assume
that examinee performance on an item does not depend on the context in
which the item occurs. This is a critical assumption and must be
evaluated carefully in each situation. It must be emphasized that
item-examinee sampling is a group assessment procedure.



Procedural Guidelines in Item-examinee Sampling 



While it is undeniably true that item-examinee sampling is an
effective norming technique, few procedural guidelines are available
to aid the researcher in determining the most appropriate number of
subtests, number of items per subtest, and number of examinees per
subtest to use in an item-examinee sampling investigation. Shoemaker
(1970), using a post-mortem item-examinee sampling paradigm,
manipulated systematically the variables of number of subtests, number
of items per subtest, and number of examinees responding to each
subtest in determining the most appropriate procedure to use when
estimating a normative distribution. Defining one observation as the
score received by one examinee on one item, the results suggested that
as the number of observations increased beyond 1,500 all item-examinee
sampling procedures produced distributions stochastically equivalent
to the normative distribution. Shoemaker concluded that, in estimating
a norm distribution by item-examinee sampling, the variable of
importance is not the item-examinee sampling procedure per se but
is instead the number of observations obtained by that procedure.



The investigation described herein was designed to isolate those
factors which produced the Shoemaker (1970) results. Major
considerations were as follows: (1) The distribution of test scores in
the Shoemaker investigation was normal and it is possible that
item-examinee sampling as a technique may be robust for normal
distributions. Distribution parameters should be estimated by a
multitude of item-examinee sampling procedures when the normative
distribution is not normal. (2) Results of 15 item-examinee sampling
procedures were reported by Shoemaker. Each procedure produced a pooled
estimate of the population mean $\mu$ and a pooled estimate of the
population variance $\sigma^2$. While the procedure used in
item-examinee sampling was not found to be a significant factor, one
sampling procedure may be preferred to another if estimates of test
parameters resulting from that procedure have less variance than
corresponding estimates obtained from another procedure. Thus, standard
errors of estimate per item-examinee sampling procedure (not computed
in the Shoemaker investigation) should be determined empirically for a
wide variety of item-examinee sampling procedures. (3) In the majority
of item-examinee sampling investigations, sampling of items has been
exhaustive and without replacement, that is, all test items have
appeared in the subtests and no item was included in more than one
subtest. Lord and Novick (1968, p. 257) have indicated that failure to
administer all test items inflates the standard error of estimating the
population mean by item-examinee sampling. Furthermore, the smaller the
number of items in the population, the worse the effect. A statement
such as this is easily verified empirically and its generalizability to
estimating the population variance should also be considered. It may
be hypothesized that improved estimates of parameters are obtained if
a particular item appears in more than one subtest. Considerations
such as these served as the basis for the experimental manipulation
described herein. The specific parameters to be estimated were the
mean $\mu$, variance $\sigma^2$, and Kuder-Richardson Formula 21
reliability coefficient for the normative distribution of total test
scores.



Method 



The research design was one of post-mortem item-examinee sampling:
given a normative distribution, various item-examinee samples are
randomly selected from this data base and used to estimate parameters
of the distribution from which they have been sampled. The first
(of two) normative distributions considered consisted of test scores
received by 1,031 kindergarten students on a 20-item
dichotomously-scored three-alternative multiple-choice
criterion-referenced examination administered during the fall of 1969
as part of the First-Year Communication Skills Program at the Southwest
Regional Laboratory for Educational Research and Development (SWRL).
The descriptive statistics for this markedly negatively-skewed
distribution are given in column 3 in Table 1.

The 36 item-examinee sampling procedures used are described in 
the first five columns of Table 2. Three levels of number of subtests 
(10, 5, 2), three levels of number of items per subtest (15, 10, 5), 
and four levels of number of examinees per subtest (120, 90, 60, 30) 
were manipulated systematically. The results obtained from each of 
the 36 item-examinee sampling procedures were replicated five times. 
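
The 3 x 3 x 4 factorial layout and the number of observations per
replication can be enumerated with a short Python sketch. The order
of enumeration here is arbitrary and is not claimed to match the row
order of Table 2. The "coverage" note is a reading of the design
arithmetic (item slots t x k compared with the 20 items available),
not a column taken from the table.

```python
from itertools import product

K = 20                                    # items in the complete test
SUBTESTS = (10, 5, 2)
ITEMS_PER_SUBTEST = (15, 10, 5)
EXAMINEES_PER_SUBTEST = (30, 60, 90, 120)

designs = []
for t, k, m in product(SUBTESTS, ITEMS_PER_SUBTEST, EXAMINEES_PER_SUBTEST):
    slots = t * k                         # item positions across all subtests
    if slots > K:
        coverage = "some items must appear in more than one subtest"
    elif slots == K:
        coverage = "exhaustive sampling without replacement is possible"
    else:
        coverage = f"at least {K - slots} items are omitted"
    designs.append({
        "subtests": t,
        "items": k,
        "examinees": m,
        "observations": t * k * m,        # one observation = one examinee, one item
        "coverage": coverage,
    })

# List the designs from fewest to most observations per replication.
for d in sorted(designs, key=lambda d: d["observations"]):
    print(d)
```

The design with 10 subtests, 15 items, and 30 examinees yields 4500
observations, and the design with 2 subtests, 5 items, and 120
examinees yields 1200.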



Results I 



The results of the 36 item-examinee sampling procedures are given
in columns 6 through 9 in Table 2. As all procedures are similar,
only the procedure and results outlined in the first row of Table 2
will be described in detail. In the first item-examinee sampling
procedure, 10 subtests, each containing 15 items, were formed. To
have 15 items per subtest, items had to be sampled for each subtest
with replacement (WR) from the 20-item population. Each subtest was
administered to 30 examinees sampled without replacement (WOR) from
the testable population of 1,031 examinees. Each item-examinee sampling
procedure produced one pooled estimate of $\mu$ and one pooled estimate
of $\sigma^2$. As each sampling procedure had been replicated five
times, there were five estimates of $\mu$ and five estimates of
$\sigma^2$. In the first item-examinee sampling procedure, the mean of
the five estimates of $\mu$ was 17.571; the standard deviation of these
five estimates (or the standard error of estimate in estimating the
population mean associated with the first procedure) was .105. The mean
of the five
TABLE 1

DESCRIPTIVE STATISTICS FOR NORMAL AND SKEWED NORMATIVE DISTRIBUTIONS

                                      Frequency
Test Score                  Normal Dist.    Skewed Dist.

     0                           12               0
     1                           14               0
     2                           24               1
     3                           36               0
     4                           45               0
     5                           42               3
     6                           71               5
     7                           55               5
     8                           94               2
     9                           90               7
    10                           88              19
    11                           91              15
    12                           74              29
    13                           84              31
    14                           56              40
    15                           55              46
    16                           35              48
    17                           22              85
    18                           18             168
    19                           16             207
    20                            9             320

Number of Examinees           1,031           1,031
Mean Test Score               9.840          17.543
Variance of Test Scores      18.889           8.950
KR21                           .774            .799
estimates of $\sigma^2$ was 9.612; their standard deviation was .497.
Each replication involved 4500 = (10)(15)(30) observations. As indicated
in the last column in Table 2, all test items were included in one or
another of the 10 subtests.
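
A post-mortem replication of the first procedure can be sketched in
Python as below. Several points are assumptions made for illustration
only. The SWRL item data are not reproduced in this report, so a
synthetic 1,031 x 20 score matrix stands in for them. Item sampling
"with replacement" is read as a fresh draw of 15 distinct items for
each subtest, so an item may recur across subtests. The standard error
is taken as the sample standard deviation (divisor 4) of the five
pooled estimates. All function names are hypothetical.

```python
import random
from statistics import mean, stdev

def estimates(scores, K):
    """Subtest estimates of the total-test mean and variance."""
    m, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    x_bar = sum(totals) / m
    s2 = sum((x - x_bar) ** 2 for x in totals) / m        # divisor m (assumed)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / m
        sum_pq += p * (1 - p)
    mu = K / k * x_bar
    var = m * K * ((K - 1) * s2 - (K - k) * sum_pq) / (k * (k - 1) * (m - 1))
    return mu, var

def one_replication(population, t, k, m, rng):
    """Draw t subtests of k items, m examinees each, and pool the estimates."""
    K = len(population[0])
    people = rng.sample(range(len(population)), t * m)    # examinees WOR
    mus, variances = [], []
    for i in range(t):
        items = rng.sample(range(K), k)                   # fresh draw per subtest
        group = people[i * m:(i + 1) * m]
        scores = [[population[e][j] for j in items] for e in group]
        mu, var = estimates(scores, K)
        mus.append(mu)
        variances.append(var)
    return mean(mus), mean(variances)

rng = random.Random(0)
# Synthetic stand-in population (NOT the SWRL data): negatively skewed scores.
population = []
for _ in range(1031):
    ability = rng.betavariate(6, 1)
    population.append([1 if rng.random() < ability else 0 for _ in range(20)])

reps = [one_replication(population, t=10, k=15, m=30, rng=rng) for _ in range(5)]
mu_hats = [r[0] for r in reps]
var_hats = [r[1] for r in reps]
print("mean of mu estimates:", mean(mu_hats), "SE:", stdev(mu_hats))
print("mean of variance estimates:", mean(var_hats), "SE:", stdev(var_hats))
```

Because the population is synthetic, the printed values illustrate
the computation only and are not comparable to the entries of Table 2.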

The results in Table 2 could be (and have been) rearranged in a 
number of ways; however, the most meaningful way appeared to be by 
number of observations. This has been done and is found in columns 
3 through 6 in Table 3. A graphic display of the same results is 
given in Figure 1. 

The estimates of $\mu$ and $\sigma^2$ obtained in each replication were
used to compute KR21 as an estimate of the KR21 obtained using the
population parameters. The mean coefficient across replications and
standard error per item-examinee sampling procedure are given in
columns 2 and 3 in Table 4.
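
As a check on the computation, the descriptive statistics of Table 1
can be recomputed from the tabled frequencies. The sketch below uses
the standard Kuder-Richardson Formula 21,
KR21 = [K/(K - 1)][1 - mu(K - mu)/(K sigma^2)], which is not written
out in this report, and a variance with divisor N, which is the divisor
that reproduces the tabled variances. Function names are hypothetical.

```python
# Frequencies for test scores 0 through 20, copied from Table 1.
NORMAL = [12, 14, 24, 36, 45, 42, 71, 55, 94, 90, 88,
          91, 74, 84, 56, 55, 35, 22, 18, 16, 9]
SKEWED = [0, 0, 1, 0, 0, 3, 5, 5, 2, 7, 19,
          15, 29, 31, 40, 46, 48, 85, 168, 207, 320]

def kr21(mu, var, K):
    """Kuder-Richardson Formula 21 from the mean, variance, and test length."""
    return K / (K - 1) * (1 - mu * (K - mu) / (K * var))

def describe(freq):
    """Return (N, mean, variance, KR21) for a score frequency table."""
    K = len(freq) - 1                    # scores run from 0 to K
    n = sum(freq)
    mu = sum(score * f for score, f in enumerate(freq)) / n
    var = sum(f * (score - mu) ** 2 for score, f in enumerate(freq)) / n
    return n, mu, var, kr21(mu, var, K)

for name, freq in (("normal", NORMAL), ("skewed", SKEWED)):
    n, mu, var, rel = describe(freq)
    print(name, n, round(mu, 3), round(var, 3), round(rel, 3))
# normal 1031 9.84 18.889 0.774
# skewed 1031 17.543 8.95 0.799
```

The same `kr21` function applies unchanged to the estimates of the mean
and variance produced by any one replication.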



Degree of Skewness in Normative Distribution 



It is not unreasonable to hypothesize that these results may be
due to the extreme degree of skewness in the normative distribution.
Would results differ if a normal normative distribution of test scores
on a 20-item test had been used in place of the skewed distribution?
To answer questions such as this and, more directly, to investigate
the effect of degree of skewness in the normative distribution on
standard errors of item-examinee sampling procedures, all item-examinee
sampling procedures were replicated using a normal normative
distribution.

Item scores for 1,031 examinees on a 20-item test were generated
by a Monte Carlo approach such that the distribution of total test
scores was normal with item difficulty indices (proportion of examinees
answering each item correctly) approximately equal to .5 and the
Kuder-Richardson Formula 21 reliability of the total test being .774.
Descriptive statistics for the normal normative distribution are given
in column 2 in Table 1.

All item-examinee sampling procedures were repeated with the 
normal distribution serving as the normative distribution. The 
statistical analyses were identical to those reported for the skewed 
distribution case. 



Results II 



The results of the 36 item-examinee sampling procedures are given 
in columns 6 through 9 in Table 5. The results in Table 5 are 
Figure 1: Mean estimate and ± one standard error of estimate for mean
$\mu$ and variance $\sigma^2$ as a function of the number of
observations for the skewed distribution case. Standard errors of
estimate are based on five replications. (Horizontal axis: number of
observations.)
TABLE 4

MEAN KR21 COEFFICIENTS AND ASSOCIATED STANDARD ERRORS OF ESTIMATE
PER ITEM-EXAMINEE SAMPLING PROCEDURE
FOR SKEWED AND NORMAL NORMATIVE DISTRIBUTIONS

Item-Examinee
Sampling             Skewed Distribution       Normal Distribution
Procedure             KR21    SE(KR21)          KR21    SE(KR21)

     1                .807      .006            .774      .019
     2                .772      .015            .779      .013
     3                .833      .020            .771      .034
     4                .802      .010            .779      .006
     5                .791      .016            .763      .021
     6                .819      .003            .745      .029
     7                .801      .006            .771      .006
     8                .797      .006            .775      .063
     9                .813      .017            .773      .011
    10                .786      .010            .778      .017
    11                .789      .020            .778      .010
    12                .773      .028            .767      .006
    13                .783      .032            .778      .028
    14                .778      .046            .783      .013
    15                .787      .045            .771      .025
    16                .790      .027            .775      .006
    17                .789      .014            .771      .028
    18                .767      .039            .752      .018
    19                .785      .020            .773      .011
    20                .796      .008            .765      .013
    21                .789      .026            .778      .025
    22                .793      .008            .768      .011
    23                .802      .013            .779      .017
    24                .765      .016            .774      .017
    25                .762      .034            .767      .023
    26                .773      .115            .687      .116
    27                .811      .094            .694      .095
    28                .767      .029            .779      .025
    29                .743      .034            .730      .051
    30                .732      .058            .752      .051
    31                .788      .028            .761      .015
    32                .765      .044            .767      .024
    33                .714      .068            .777      .054
    34                .775      .031            .760      .016
    35                .794      .035            .756      .029
    36                .768      .020            .782      .027

   Norm               .799                      .774
  36     2     5     120    WOR    WOR    9.727    19.618    .456    2.101     1200
 Norm    1    20   1,031    WOR    WOR    9.840    18.889                     20620
interpreted in the same manner as those in Table 2. Results for KR21
coefficients are given in columns 4 and 5 in Table 4.

The results in Table 5 have been rearranged and pooled according
to number of observations and are given in columns 7 through 10 in
Table 3. A graphic display of the same results is given in Figure 2.



Discussion 



The most apparent difference in standard errors associated with
item-examinee sampling procedures is the difference in magnitude
between the standard error in estimating the population mean and the
standard error in estimating the population variance. Item-examinee
sampling procedures are generally efficient in estimating $\mu$ and
much less efficient in estimating $\sigma^2$. Indeed, it would seem
that almost any procedure could be used in estimating $\mu$. In view of
these results, it is not surprising that all item-examinee sampling
investigations in the literature have reported satisfactory estimates
of the mean of the normative distribution. The degree of accuracy in
estimating $\sigma^2$ is most obviously a function of the number of
observations. Parameters for both distributions can be estimated
accurately given a large number of observations; the number of
observations taken by the researcher should be determined by the choice
of parameter to be estimated and the desired accuracy of the results.
Results do not appear to be influenced significantly by degree of
skewness in the normative distribution.

For several of the item-examinee sampling procedures considered,
e.g., number 1 in Table 2, identical items were included in more than
one subtest. Other sampling procedures sampled items exhaustively and
without replacement: all items were sampled and no item was included
in more than one subtest. An example of this case is procedure 26 in
Table 2. In some procedures, a number of items were excluded from all
subtests as, for example, procedure 27 in Table 2. The effect of these
sampling variations on the standard errors of estimate is most
appropriately interpreted in terms of number of observations: the
greater the number of observations, the smaller the standard error of
estimate. If the results in Tables 2 and 5 are individually rearranged
(without averaging results over procedures having the same number of
observations) and standard errors are examined as a function of number
of items omitted, no trend is apparent. If failure to administer all
test items does influence standard errors of estimate, perhaps the
effect would have been more apparent if the number of replications per
item-examinee sampling procedure had been significantly increased.
Figure 2: Mean estimate and ± one standard error of estimate for mean
$\mu$ and variance $\sigma^2$ as a function of the number of
observations for the normal distribution case. Standard errors of
estimate are based on five replications. (Horizontal axis: number of
observations.)
In general, the results of this investigation support the
conclusion that, in estimating parameters by item-examinee sampling,
the variable of importance does not appear to be the item-examinee
sampling procedure but is instead the number of observations obtained
by that procedure. All item-examinee sampling, item-sampling, and
examinee-sampling investigations should report standard errors for
each sampling procedure considered. The interpretation of results
is greatly simplified in the light of standard errors of estimate
per parameter.